Scripting distributed scientific workflows using Weaver

Authors

  • Peter Bui
  • Li Yu
  • Andrew Thrasher
  • Rory Carmichael
  • Irena Lanc
  • Patrick Donnelly
  • Douglas Thain

Abstract

Abstraction                             Description
Run(function, inputs)                   Apply inputs to function.
Map(function, inputs)                   For each input in inputs, apply function.
MapReduce(mapper, reducer, inputs)      For each input in inputs, apply mapper, sort the intermediate outputs, and then apply reducer.
AllPairs(function, inputs_a, inputs_b)  Apply function to all combinations of inputs_a and inputs_b.
Wavefront(function, matrix)             Compute a recurrence relation by applying function to matrix in a wave pattern.

Copyright © 2011 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2012; 24:1685–1707 DOI: 10.1002/cpe

... inputs to generate the outputs. Although very simple, this abstraction is useful for constructing more complicated patterns of execution.

2.3.2. Map. The Map abstraction is a common pattern for work that exhibits data parallelism. Map takes an input function, which is applied to each item in the input dataset. The results of the function applications are stored in a collection of output data objects, or in a single data file if the user specifies an output target. Because each function application is independent of the others, the individual tasks in this pattern are data parallel and can therefore be executed concurrently.

2.3.3. MapReduce. Another common data-processing abstraction provided by Weaver is MapReduce [1]. In this pattern, a mapper function is applied to the initial set of inputs to generate a group of intermediate output files that are partitioned, sorted, and then passed to the reducer function for aggregation. Within each phase, the tasks are independent of one another and can therefore run in parallel; the reducer phase starts only after the intermediate outputs have been sorted.

2.3.4. All-Pairs. An abstraction frequently used in fields such as biometrics and data mining is All-Pairs [2]. In this pattern, each member of one dataset is compared with each member of another dataset to produce a matrix that contains the score of each comparison; datasets of sizes m and n therefore yield an m × n matrix of m·n comparisons.
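
To make the data-independence argument above concrete, the semantics of the Run, Map, MapReduce, and All-Pairs abstractions can be sketched in plain, in-process Python. This is a minimal sketch under stated assumptions, not Weaver's API: the names `run`, `map_`, `map_reduce`, and `all_pairs`, the `max_workers` parameter, the convention that a mapper returns (key, value) pairs, and the `reducer(key, values)` signature are all hypothetical. A local thread pool stands in for the distributed system that would execute real tasks, and in-memory pairs stand in for intermediate output files.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby, product
from operator import itemgetter


def run(function, inputs):
    """Run (hypothetical sketch): apply the inputs to the function once."""
    return function(*inputs)


def map_(function, inputs, max_workers=4):
    """Map: apply function to each input. The calls are independent, so a
    thread pool may execute them concurrently; result order is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(function, inputs))


def map_reduce(mapper, reducer, inputs, max_workers=4):
    """MapReduce: mapper turns each input into (key, value) pairs; the pairs
    are sorted and grouped by key; reducer aggregates each group."""
    # Map phase: every mapper call is independent of the others.
    mapped = map_(mapper, inputs, max_workers)
    intermediate = [pair for pairs in mapped for pair in pairs]
    # Sort phase: order intermediate pairs by key so equal keys are adjacent.
    intermediate.sort(key=itemgetter(0))
    groups = [(key, [value for _, value in group])
              for key, group in groupby(intermediate, key=itemgetter(0))]
    # Reduce phase: starts only after sorting; each key is reduced independently.
    results = map_(lambda kv: (kv[0], reducer(kv[0], kv[1])), groups, max_workers)
    return dict(results)


def all_pairs(function, inputs_a, inputs_b, max_workers=4):
    """All-Pairs: compare every a with every b; row i of the result holds
    the scores of inputs_a[i] against all of inputs_b."""
    pairs = list(product(inputs_a, inputs_b))
    scores = map_(lambda ab: function(*ab), pairs, max_workers)
    width = len(inputs_b)
    return [scores[i * width:(i + 1) * width] for i in range(len(inputs_a))]


if __name__ == "__main__":
    print(run(max, [3, 7]))                        # 7
    print(map_(lambda x: x * x, [1, 2, 3]))        # [1, 4, 9]
    word_count = map_reduce(
        lambda line: [(word, 1) for word in line.split()],  # mapper
        lambda word, counts: sum(counts),                   # reducer
        ["a b a", "b c"],
    )
    print(word_count)                              # {'a': 2, 'b': 2, 'c': 1}
    print(all_pairs(lambda a, b: a * b, [1, 2], [3, 4, 5]))  # [[3, 4, 5], [6, 8, 10]]
```

Map and All-Pairs hand every independent task to the pool at once, while `map_reduce` inserts the sort as a barrier between its two parallel phases, mirroring the ordering described for MapReduce.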

Like the previous abstractions, the individual comparison tasks can execute independently of each other, which allows the jobs to be scheduled to run concurrently.

2.3.5. Wavefront. An abstraction used in game theory and gene sequencing applications is Wavefront [13], which computes a two-dimensional recurrence relation in which each cell of the output matrix is generated by a function whose arguments are the values of the cells immediately to the left, immediately below, and diagonally to the left and below. Although some cells can be processed in parallel, the recurrence relation means that dependent cells must be computed in the proper order: a cell can be computed only after its three neighbors are known, so the cells on one anti-diagonal can run concurrently once the preceding diagonals are complete.

By default, Weaver includes all five of these abstractions as part of the framework. However, because abstractions are just normal Python functions, users can extend the existing ones or define their own abstractions specific to their workflow.
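
The ordering constraint that Wavefront imposes can be sketched in plain Python by processing the matrix one anti-diagonal at a time. This is a hypothetical illustration, not Weaver's implementation: the `wavefront` name, the `max_workers` parameter, the convention that row 0 and column 0 hold given boundary values, and the argument order `function(left, below, diagonal)` are assumptions. Row index 0 is treated as the bottom row, so "below" means row `i - 1`.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefront(function, matrix, max_workers=4):
    """Fill matrix in place (hypothetical sketch). Row 0 and column 0 hold
    boundary values; cell (i, j) = function(left, below, diagonal), where
    row 0 is the bottom row."""
    rows, cols = len(matrix), len(matrix[0])

    def compute(cell):
        i, j = cell
        left, below, diagonal = matrix[i][j - 1], matrix[i - 1][j], matrix[i - 1][j - 1]
        matrix[i][j] = function(left, below, diagonal)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Anti-diagonal d holds the interior cells with i + j == d. Each cell depends
        # only on diagonals d - 1 and d - 2, so cells on one diagonal are independent.
        for d in range(2, rows + cols - 1):
            cells = [(i, d - i) for i in range(max(1, d - cols + 1), min(rows, d))]
            # list() blocks until the whole diagonal is done (and re-raises errors).
            list(pool.map(compute, cells))
    return matrix


if __name__ == "__main__":
    grid = [[1, 1, 1],
            [1, None, None],
            [1, None, None]]
    print(wavefront(lambda left, below, diag: left + below + diag, grid))
    # [[1, 1, 1], [1, 3, 5], [1, 5, 13]]  (printed with the bottom row first)
```

The usage example applies left + below + diagonal to all-ones boundaries, which yields the Delannoy numbers; any recurrence with the same three-neighbor shape fits. Waiting for each whole diagonal before starting the next is the simplest correct schedule; a finer-grained scheduler could start a cell as soon as its three neighbors finish.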

Similar resources

Using Scientific Workflows for Science and Engineering Optimisation

Scientific workflows have been applied to a wide range of problems from science and engineering to ecology. They deliver infrastructure that simplifies scripting complex distributed experiments. For example, data may be sourced from one or more locations, and used to drive a pipeline of computational models. Processing steps may vary from simple-minded data reformatting and pre-processing, whic...

Tool Integration with the Evidential Tool Bus

Formal and semi-formal tools are now being used in large projects both for development and certification. A typical project integrates many diverse tools such as static analyzers, model checkers, test generators, and constraint solvers. These tools are usually integrated in an ad hoc manner. There is, however, a need for a tool integration framework that can be used to systematically create wor...

Approaches to Distributed Execution of Scientific Workflows in Kepler

The Kepler scientific workflow system enables creation, execution and sharing of workflows across a broad range of scientific and engineering disciplines while also facilitating remote and distributed execution of workflows. In this paper, we present and compare different approaches to distributed execution of workflows using the Kepler environment, including a distributed data-parallel framewor...

Constructing workflows from script applications

For programming and executing complex applications on grid infrastructures, scientific workflows have been proposed as a convenient high-level alternative to solutions based on general-purpose programming languages, APIs and scripts. GridSpace is a collaborative programming and execution environment, which is based on a scripting approach and extends the Ruby language with a high-level API for inv...

Hydrologists workbench: A governance model for scientific workflow environments

Scientific workflows (SWF) are an emerging approach that enables scientists to compose and execute complex, distributed scientific processes. The approach is premised on the ability to compose, publish, share and reuse workflows across distributed communities of collaborating scientists. Scientific workflow software (SWFS) provides a technical framework to compose, publish, and reuse SWFs toget...

Journal title:
  • Concurrency and Computation: Practice and Experience

Volume: 24

Pages: 1685–1707

Publication date: 2012